Deep Reinforcement Learning for Swarm Systems
Recently, deep reinforcement learning (RL) methods have been applied
successfully to multi-agent scenarios. Typically, these methods rely on a
concatenation of agent states to represent the information content required for
decentralized decision making. However, concatenation scales poorly to swarm
systems with a large number of homogeneous agents as it does not exploit the
fundamental properties inherent to these systems: (i) the agents in the swarm
are interchangeable and (ii) the exact number of agents in the swarm is
irrelevant. Therefore, we propose a new state representation for deep
multi-agent RL based on mean embeddings of distributions. We treat the agents
as samples of a distribution and use the empirical mean embedding as input for
a decentralized policy. We define different feature spaces of the mean
embedding using histograms, radial basis functions and a neural network learned
end-to-end. We evaluate the representation on two well-known problems from the
swarm literature (rendezvous and pursuit evasion), in a globally and locally
observable setup. For the local setup we furthermore introduce simple
communication protocols. Of all approaches, the mean embedding representation
using neural network features enables the richest information exchange between
neighboring agents, facilitating the development of more complex collective
strategies.
Comment: 31 pages, 12 figures, version 3 (published in JMLR Volume 20)
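As a rough sketch of the idea rather than the authors' exact architecture, the neural-network variant of the mean embedding could look as follows; the module names, layer sizes, and PyTorch framing are assumptions.

```python
import torch
import torch.nn as nn

class MeanEmbeddingPolicy(nn.Module):
    """Decentralized policy fed with the empirical mean embedding of the
    neighbors' states (a sketch of the learned-feature variant)."""

    def __init__(self, obs_dim, feat_dim, act_dim):
        super().__init__()
        # Shared feature network phi, applied to every neighbor state.
        self.phi = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(),
                                 nn.Linear(64, feat_dim))
        # Policy head conditioned on the agent's own state plus the embedding.
        self.head = nn.Sequential(nn.Linear(obs_dim + feat_dim, 64), nn.ReLU(),
                                  nn.Linear(64, act_dim))

    def forward(self, own_state, neighbor_states):
        # neighbor_states: (num_neighbors, obs_dim). Averaging makes the
        # input permutation-invariant and independent of the swarm size,
        # which is exactly properties (i) and (ii) above.
        mean_embedding = self.phi(neighbor_states).mean(dim=0)
        return self.head(torch.cat([own_state, mean_embedding], dim=-1))
```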
Guided Deep Reinforcement Learning for Swarm Systems
In this paper, we investigate how to learn to control a group of cooperative
agents with limited sensing capabilities, such as robot swarms. The agents have
only very basic sensor capabilities, yet in a group they can accomplish
sophisticated tasks, such as distributed assembly or search and rescue tasks.
Learning a policy for a group of agents is difficult due to distributed partial
observability of the state. Here, we follow a guided approach where a critic
has central access to the global state during learning, which simplifies the
policy evaluation problem from a reinforcement learning point of view. For
example, we can get the positions of all robots of the swarm using a camera
image of a scene. This camera image is only available to the critic and not to
the control policies of the robots. We follow an actor-critic approach, where
the actors base their decisions only on locally sensed information. In
contrast, the critic is learned based on the true global state. Our algorithm
uses deep reinforcement learning to approximate both the Q-function and the
policy. The performance of the algorithm is evaluated on two tasks with simple
simulated 2D agents: 1) finding and maintaining a certain distance to each
other and 2) locating a target.
Comment: 15 pages, 8 figures, accepted at the AAMAS 2017 Autonomous Robots
and Multirobot Systems (ARMS) Workshop
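A minimal sketch of this asymmetric actor-critic split, assuming PyTorch and illustrative layer sizes; the class and variable names are hypothetical.

```python
import torch
import torch.nn as nn

class CentralCritic(nn.Module):
    """Q-function with access to the true global state (e.g., all robot
    positions from a camera image); used during learning only."""

    def __init__(self, global_dim, act_dim):
        super().__init__()
        self.q = nn.Sequential(nn.Linear(global_dim + act_dim, 128), nn.ReLU(),
                               nn.Linear(128, 1))

    def forward(self, global_state, action):
        return self.q(torch.cat([global_state, action], dim=-1))

class LocalActor(nn.Module):
    """Policy that bases its decisions only on locally sensed information,
    so it remains executable on each robot after training."""

    def __init__(self, local_dim, act_dim):
        super().__init__()
        self.pi = nn.Sequential(nn.Linear(local_dim, 64), nn.ReLU(),
                                nn.Linear(64, act_dim))

    def forward(self, local_obs):
        return self.pi(local_obs)
```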
Variational inference for policy search in changing situations
Many policy search algorithms minimize the Kullback-Leibler (KL) divergence to a certain
target distribution in order to fit their policy. The commonly used KL-divergence forces the
resulting policy to be 'reward-attracted': the policy tries to reproduce all positively rewarded
experience while negative experience is neglected. However, the KL-divergence is not symmetric,
and we can also minimize the reversed KL-divergence, which is typically used in variational
inference. The policy then becomes 'cost-averse': it tries to avoid reproducing any negatively
rewarded experience while maximizing exploration. Due to this cost-averseness, Variational
Inference for Policy Search (VIP) has several interesting properties. It requires neither a
kernel bandwidth nor an exploration rate; such settings are determined automatically by the
inference. The algorithm matches the performance of state-of-the-art methods while being
applicable to learning in multiple situations simultaneously. We concentrate on using VIP for
policy search in robotics and apply our algorithm to learn dynamic counterbalancing of
different kinds of pushes with human-like 2-link and 4-link robots.
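In symbols, with p denoting the reward-weighted target distribution and π the policy, the two divergence directions read as follows (standard notation, not the paper's exact formulation):

```latex
% Moment projection ("reward-attracted"): the policy must put mass
% wherever p does, so all positively rewarded experience is reproduced.
\pi^\ast = \arg\min_\pi \, \mathrm{KL}(p \,\|\, \pi)

% Information projection ("cost-averse"), as in variational inference:
% the policy avoids regions where p has little mass (negatively rewarded
% experience), while the entropy of \pi keeps exploration high.
\pi^\ast = \arg\min_\pi \, \mathrm{KL}(\pi \,\|\, p)
         = \arg\min_\pi \, \mathbb{E}_\pi[\log \pi - \log p]
```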
Pressure Calculation in Polar and Charged Systems using Ewald Summation: Results for the Extended Simple Point Charge Model of Water
Ewald summation and physically equivalent methods such as particle-mesh
Ewald, kubic-harmonic expansions, or Lekner sums are commonly used to calculate
long-range electrostatic interactions in computer simulations of polar and
charged substances. The calculation of pressures in such systems is
investigated. We find that the virial and thermodynamic pressures differ
because of the explicit volume dependence of the effective, resummed Ewald
potential. The thermodynamic pressure, obtained from the volume derivative of
the Helmholtz free energy, can be expressed easily for both ionic and rigid
molecular systems. For a system of rigid molecules, the electrostatic energy
and the forces at the atom positions are required, both of which are readily
available in molecular dynamics codes. We then calculate the virial and
thermodynamic pressures for the extended simple point charge (SPC/E) water
model at standard conditions. We find that the thermodynamic pressure exhibits
considerably less system size dependence than the virial pressure. From an
analysis of the cross correlation between the virial and thermodynamic
pressure, we conclude that the thermodynamic pressure should be used to drive
volume fluctuations in constant-pressure simulations.
Comment: RevTeX, 19 pages, 2 EPS figures; in press: Journal of Chemical
Physics, 15-August-1998
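In textbook form (standard statistical mechanics, not quoted from the paper), the two pressures and the origin of their difference are:

```latex
% Thermodynamic pressure: volume derivative of the Helmholtz free energy A.
P_{\mathrm{thermo}} = -\left(\frac{\partial A}{\partial V}\right)_{N,T}

% Virial pressure: ideal-gas term plus the force virial.
P_{\mathrm{virial}} = \frac{N k_B T}{V}
  + \frac{1}{3V} \Big\langle \sum_i \mathbf{r}_i \cdot \mathbf{f}_i \Big\rangle

% The two differ because the resummed Ewald potential U(\mathbf{r}^N; V)
% depends explicitly on the box volume, contributing an additional
% -\langle \partial U / \partial V \rangle term to the thermodynamic pressure.
```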
Fitted Q-iteration by advantage weighted regression
Recently, fitted Q-iteration (FQI) based methods have become more popular due
to their increased sample efficiency, a more stable learning process and the higher
quality of the resulting policy. However, these methods remain hard to use for continuous
action spaces which frequently occur in real-world tasks, e.g., in robotics
and other technical applications. The greedy action selection commonly used for
the policy improvement step is particularly problematic as it is expensive for continuous
actions, can cause an unstable learning process, introduces an optimization
bias and results in highly non-smooth policies unsuitable for real-world systems.
In this paper, we show that by using a soft-greedy action selection the policy
improvement step used in FQI can be simplified to an inexpensive advantage weighted
regression. With this result, we are able to derive a new, computationally
efficient FQI algorithm which can even deal with high-dimensional action spaces.
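A minimal sketch of the resulting policy improvement step, assuming a linear policy and hypothetical variable names: the greedy maximization is replaced by a regression in which each sample is weighted by its exponentiated advantage.

```python
import numpy as np

def advantage_weighted_regression(states, actions, advantages, beta=1.0):
    """Fit the mean of a linear policy by weighted least squares.

    states: (n, d), actions: (n, act_dim), advantages: (n,).
    Samples are weighted by exp(A(s, a) / beta), so high-advantage actions
    dominate -- the soft-greedy step that replaces an expensive greedy
    maximization over continuous actions.
    """
    # Subtract the max advantage for numerical stability.
    w = np.exp((advantages - advantages.max()) / beta)

    # Weighted linear regression of actions on state features (plus bias).
    X = np.hstack([states, np.ones((len(states), 1))])
    sw = np.sqrt(w)[:, None]
    theta, *_ = np.linalg.lstsq(sw * X, sw * actions, rcond=None)
    return theta  # maps [state, 1] to the new policy mean action

# Hypothetical usage with sampled transitions:
# theta = advantage_weighted_regression(S, A, advantages, beta=0.5)
```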
On Uncertainty in Deep State Space Models for Model-Based Reinforcement Learning
Improved state space models, such as Recurrent State Space Models (RSSMs), are a key factor behind recent advances in model-based reinforcement learning (RL).
Yet, despite their empirical success, many of the underlying design choices are not well understood.
We show that RSSMs use a suboptimal inference scheme and that models trained using this inference overestimate the aleatoric uncertainty of the ground truth system.
We find this overestimation implicitly regularizes RSSMs and allows them to succeed in model-based RL.
We postulate that this implicit regularization fulfills the same functionality as explicitly modeling epistemic uncertainty, which is crucial for many other model-based RL approaches.
Yet, overestimating aleatoric uncertainty can also impair performance in cases where accurately estimating it matters, e.g., when we have to deal with occlusions, missing observations, or fusing sensor modalities at different frequencies.
Moreover, the implicit regularization is a side-effect of the inference scheme and not the result of a rigorous, principled formulation, which renders analyzing or improving RSSMs difficult.
Thus, we propose an alternative approach building on well-understood components for modeling aleatoric and epistemic uncertainty, dubbed Variational Recurrent Kalman Network (VRKN).
This approach uses Kalman updates for exact smoothing inference in a latent space and Monte Carlo Dropout to model epistemic uncertainty.
Due to the Kalman updates, the VRKN can naturally handle missing observations or sensor fusion problems with varying numbers of observations per time step.
Our experiments show that using the VRKN instead of the RSSM improves performance in tasks where appropriately capturing aleatoric uncertainty is crucial, while matching it in the deterministic standard benchmarks.
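A toy latent-space measurement update that makes the missing-observation handling concrete; the actual VRKN performs such updates on learned latent representations and adds Monte Carlo Dropout for epistemic uncertainty, and every dimension and noise model below is a placeholder.

```python
import numpy as np

def kalman_update(mu, Sigma, y, H, R):
    """One Kalman measurement update on a latent Gaussian belief.

    If no observation arrived at this time step (y is None), the prior
    belief is passed through unchanged -- missing observations need no
    imputation, only a skipped update.
    """
    if y is None:
        return mu, Sigma
    S = H @ Sigma @ H.T + R                # innovation covariance
    K = Sigma @ H.T @ np.linalg.inv(S)     # Kalman gain
    mu_post = mu + K @ (y - H @ mu)        # corrected latent mean
    Sigma_post = (np.eye(len(mu)) - K @ H) @ Sigma
    return mu_post, Sigma_post

# With several sensors running at different rates, one such update per
# available sensor (each with its own H and R) fuses whatever arrived
# at this time step.
```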
"The numerical accuracy of truncated Ewald sums for periodic systems with long-range Coulomb interactions"
Ewald summation is widely used to calculate electrostatic interactions in
computer simulations of condensed-matter systems. We present an analysis of the
errors arising from truncating the infinite real- and Fourier-space lattice
sums in the Ewald formulation. We derive an optimal choice for the
Fourier-space cutoff given a screening parameter η. We find that the
number of vectors in Fourier space required to achieve a given accuracy scales
with η³. The proposed method can be used to determine computationally
efficient parameters for Ewald sums, to assess the quality of Ewald-sum
implementations, and to compare different implementations.
Comment: 6 pages, 3 figures (Encapsulated PostScript), LaTeX
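The cubic scaling is easy to check numerically: if the optimal Fourier-space cutoff grows linearly with the screening parameter, the number of reciprocal-lattice vectors inside the cutoff sphere grows with its cube. The helper below is purely illustrative.

```python
import numpy as np

def count_k_vectors(k_max):
    """Count integer vectors n with 0 < |n| <= k_max (a cutoff sphere)."""
    r = int(np.ceil(k_max))
    grid = np.arange(-r, r + 1)
    nx, ny, nz = np.meshgrid(grid, grid, grid, indexing="ij")
    n2 = nx**2 + ny**2 + nz**2
    return int(((n2 > 0) & (n2 <= k_max**2)).sum())

# Doubling the cutoff (i.e., doubling the screening parameter at fixed
# accuracy) multiplies the count by roughly 8, as (4/3) pi k_max^3 predicts:
for k_max in (4, 8, 16):
    print(k_max, count_k_vectors(k_max))
```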
Formation of Polymorphic Cluster Phases for Purely Repulsive Soft Spheres
We present results from density functional theory and computer simulations
that unambiguously predict the occurrence of first-order freezing transitions
for a large class of ultrasoft model systems into cluster crystals. The
clusters consist of fully overlapping particles and arise without the existence
of attractive forces. The number of particles participating in a cluster scales
linearly with density, therefore the crystals feature density-independent
lattice constants. Clustering is accompanied by polymorphic bcc-fcc
transitions, with fcc being the stable phase at high densities.
Comment: 4 pages, 5 figures, submitted to Phys. Rev. Lett.
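The density-independent lattice constant follows from a one-line argument (standard reasoning spelled out here, not quoted from the abstract): if the cluster occupancy grows linearly with density, the density of lattice sites stays fixed.

```latex
% Occupancy linear in density: n_c = k \rho. Then the site density is
% \rho / n_c = 1/k = \text{const}, so the volume per lattice site, and
% hence the lattice constant of the cluster crystal, is independent of \rho:
a \propto \left( \frac{n_c}{\rho} \right)^{1/3} = k^{1/3} = \text{const}.
```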
Hierarchical relative entropy policy search
Many real-world problems are inherently hierarchically structured. The use of
this structure in an agent's policy may well be the key to improved scalability
and higher performance. However, such hierarchical structures cannot be
exploited by current policy search algorithms. We will concentrate on a basic
but highly relevant hierarchy: the 'mixed option' policy. Here, a gating
network first decides which of the options to execute and, subsequently, the
option-policy determines the action.
In this paper, we reformulate learning a hierarchical policy as a latent
variable estimation problem and subsequently extend Relative Entropy Policy
Search (REPS) to the latent variable case. We show that our Hierarchical REPS
can learn versatile solutions while also showing increased performance in
terms of learning speed and quality of the found policy in comparison to the
non-hierarchical approach.
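The 'mixed option' structure is a latent-variable model in which the executed option o is the unobserved variable (standard notation, consistent with the description above):

```latex
% The gating network \pi(o \mid s) selects an option, then the option-policy
% \pi(a \mid s, o) selects the action; marginalizing over the latent o gives:
\pi(a \mid s) = \sum_{o} \pi(o \mid s) \, \pi(a \mid s, o)
```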